Automatic Differentiation

Tensorial differentiates scalar, vector, tensor, and packed-state functions in the tensor spaces of their arguments. For most code, start with gradient and hessian. The callable operator is the general form for higher derivatives.

Gradients and hessians

For a scalar-valued function of a vector input, gradient returns a vector and hessian returns a second-order tensor:

julia> a = @Vec [3.0, 4.0]2-element Vec{2, Float64}:
 3.0
 4.0
julia> φ(a) = 0.5 * (a ⊡ a)φ (generic function with 1 method)
julia> gradient(φ, a)2-element Vec{2, Float64}: 3.0 4.0
julia> hessian(φ, a)2×2 SymmetricSecondOrderTensor{2, Float64, 3}: 1.0 0.0 0.0 1.0

When an argument is a symmetric tensor, AD is performed on that symmetric tensor space:

julia> ε = symmetric(@Mat [0.02 0.01 0.0; 0.01 0.00 0.0; 0.0 0.0 -0.01])3×3 SymmetricSecondOrderTensor{3, Float64, 6}:
 0.02  0.01   0.0
 0.01  0.0    0.0
 0.0   0.0   -0.01
julia> K = 10.0; # bulk modulus
julia> G = 5.0; # shear modulus
julia> ψ(ε) = K/2 * tr(ε)^2 + G * (dev(ε) ⊡₂ dev(ε))ψ (generic function with 1 method)
julia> σ = gradient(ψ, ε);
julia> σ isa SymmetricSecondOrderTensor{3}true
julia> ℂ = hessian(ψ, ε);
julia> ℂ isa SymmetricFourthOrderTensor{3}true

The tensor space of the argument matters. If a value is symmetric only numerically, but its type is a general matrix tensor, the derivative is taken in the general matrix space:

julia> A = rand(Mat{3,3})3×3 Tensor{Tuple{3, 3}, Float64, 2, 9}:
 0.325977  0.894245  0.953125
 0.549051  0.353112  0.795547
 0.218587  0.394255  0.49425
julia> S = A * A'; # symmetric in value, but not typed as a symmetric tensor3×3 Tensor{Tuple{3, 3}, Float64, 2, 9}: 1.81438 1.253 0.894897 1.253 1.05904 0.65243 0.894897 0.65243 0.4475
julia> gradient(identity, S) ≈ one(FourthOrderTensor{3})true
julia> gradient(identity, symmetric(S)) ≈ one(SymmetricFourthOrderTensor{3})true

For more on symmetric tensor spaces, see Tensor Types and Spaces.

Returning the function value

Pass :all when you want derivatives and the function value from one call. For gradient, the return value is

(gradient(f, x), f(x))

and for hessian, it is

(hessian(f, x), gradient(f, x), f(x))
julia> gradient(φ, a, :all)([3.0, 4.0], 12.5)
julia> hessian(φ, a, :all)([1.0 0.0; 0.0 1.0], [3.0, 4.0], 12.5)

For the general operator ∂{N}, :all returns derivatives from higher to lower order, ending with the function value:

(∂{N}(f, x), ..., ∂{2}(f, x), ∂(f, x), f(x))

The general operator

For scalar input and scalar output, is the most direct spelling. ∂(f, args...) is equivalent to ∂{1}(f, args...). Use braces for higher orders:

julia> f(x) = x^4 + xf (generic function with 1 method)
julia> x = 2.02.0
julia> f(x)18.0
julia> ∂(f, x)33.0
julia> ∂{2}(f, x)48.0
julia> ∂{2}(f, x, :all)(48.0, 33.0, 18.0)
julia> ∂{3}(x -> x^5, x)240.0

The general operator is useful when the derivative order is part of the formula you want to write down. gradient(f, x) is ∂{1}(f, x), and hessian(f, x) is ∂{2}(f, x).

Multiple inputs

For multiple inputs, the first derivative is returned as a tuple whose entries follow the order of the inputs:

julia> gradient((x, y) -> x^2 + 3x*y + y^2, 2.0, 4.0)(16.0, 14.0)
julia> gradient((x, y) -> x^2 + 3x*y + y^2, 2.0, 4.0, :all)((16.0, 14.0), 44.0)

The result represents:

(∂f/∂x, ∂f/∂y)

With :all, it returns:

((∂f/∂x, ∂f/∂y), f(x, y))

Second derivatives for multiple inputs are returned as a block Hessian:

julia> hessian((x, y) -> x^2 + x*y + y^3, 2.0, 3.0)((2.0, 1.0), (1.0, 18.0))
julia> hessian((x, y) -> x^2 + x*y + y^3, 2.0, 3.0, :all)(((2.0, 1.0), (1.0, 18.0)), (7.0, 29.0), 37.0)

The Hessian block structure is

(
    (∂²f/∂x², ∂²f/∂x∂y),
    (∂²f/∂y∂x, ∂²f/∂y²),
)

The same block structure is used when the input types differ:

julia> A = symmetric(@Mat [1.0 0.2; 0.2 2.0])2×2 SymmetricSecondOrderTensor{2, Float64, 3}:
 1.0  0.2
 0.2  2.0
julia> d = gradient((x, A) -> x * tr(A), x, A)(3.0, [2.0 0.0; 0.0 2.0])
julia> d[1]3.0
julia> d[2] isa SymmetricSecondOrderTensor{2}true
julia> H = hessian((x, A) -> x * tr(A), x, A);
julia> H[1][2] isa SymmetricSecondOrderTensor{2}true
julia> H[2][2] isa SymmetricFourthOrderTensor{2}true

Multiple outputs

If f returns a tuple, each output is differentiated separately. The outer tuple follows the outputs:

julia> gradient(x -> (x^2, x^3), 2.0)(4.0, 12.0)
julia> gradient(x -> (x^2, x^3), 2.0, :all)((4.0, 12.0), (4.0, 8.0))

The result represents:

(∂f₁/∂x, ∂f₂/∂x)

With :all, it returns:

((∂f₁/∂x, ∂f₂/∂x), (f₁(x), f₂(x)))

Second derivatives are handled in the same way:

julia> hessian(x -> (x^2, x^3), 2.0)(2.0, 12.0)
julia> hessian(x -> (x^2, x^3), 2.0, :all)((2.0, 12.0), (4.0, 12.0), (4.0, 8.0))

The result represents:

(∂²f₁/∂x², ∂²f₂/∂x²)

and :all returns

(
    (∂²f₁/∂x², ∂²f₂/∂x²),
    (∂f₁/∂x, ∂f₂/∂x),
    (f₁(x), f₂(x)),
)

Multiple inputs and multiple outputs

When there are both multiple inputs and multiple outputs, the outer tuple follows the outputs, and the inner tuple follows the inputs:

julia> gradient((x, y) -> (x + y, x * y), 2.0, 3.0)((1.0, 1.0), (3.0, 2.0))
julia> gradient((x, y) -> (x + y, x * y), 2.0, 3.0, :all)(((1.0, 1.0), (3.0, 2.0)), (5.0, 6.0))

The result represents:

(
    (∂f₁/∂x, ∂f₁/∂y),
    (∂f₂/∂x, ∂f₂/∂y),
)

For second derivatives, each output carries its own block Hessian:

julia> hessian((x, y) -> (x + y, x * y), 2.0, 3.0)(((0.0, 0.0), (0.0, 0.0)), ((0.0, 1.0), (1.0, 0.0)))
julia> hessian((x, y) -> (x + y, x * y), 2.0, 3.0, :all)((((0.0, 0.0), (0.0, 0.0)), ((0.0, 1.0), (1.0, 0.0))), ((1.0, 1.0), (3.0, 2.0)), (5.0, 6.0))

The result represents:

(
    (
        (∂²f₁/∂x², ∂²f₁/∂x∂y),
        (∂²f₁/∂y∂x, ∂²f₁/∂y²),
    ),
    (
        (∂²f₂/∂x², ∂²f₂/∂x∂y),
        (∂²f₂/∂y∂x, ∂²f₂/∂y²),
    ),
)

With :all, the return value is

(
    second_derivatives,
    first_derivatives,
    function_value,
)

where second_derivatives has the block-Hessian structure shown above and first_derivatives has the output/input structure of gradient.

For automatic-differentiation docstrings, see Automatic differentiation API.